Goto

Collaborating Authors

 gaussian kernel


Finite-Particle Convergence Rates for Conservative and Non-Conservative Drifting Models

arXiv.org Machine Learning

We propose and analyze a conservative drifting method for one-step generative modeling. The method replaces the original displacement-based drifting velocity by a kernel density estimator (KDE)-gradient velocity, namely the difference of the kernel-smoothed data score and the kernel-smoothed model score. This velocity is a gradient field, addressing the non-conservatism issue identified for general displacement-based drifting fields. We prove continuous-time finite-particle convergence bounds for the conservative method on $\R^d$: a joint-entropy identity yields bounds for the empirical Stein drift, the smoothed Fisher discrepancy of the KDE, and the squared center velocity. The main finite-particle correction is a reciprocal-KDE self-interaction term, and we give deterministic and high-probability local-occupancy conditions under which this term is controlled. We keep the quadrature constants explicit and track their possible bandwidth dependence: the root residual-velocity rate $N^{-1/(d+4)}$ holds under an additional $h$-uniform quadrature regularity condition, while a more general growth condition yields the optimized root rate $N^{-(2-ฮฒ)/(2(d+4-ฮฒ))}$, where $0\le ฮฒ<2$. We also analyze the non-conservative drifting method with Laplace kernel, corresponding to the original displacement-based velocity proposed in Deng et al., 2026 (arxiv:2602.04770). For this method, a sharp companion kernel decomposes the velocity into a positive scalar preconditioning of a sharp-score mismatch plus a Laplace scale-mismatch residual, producing an analogous finite-particle rate with an unavoidable residual term. Finally, we explain how the continuous-time residual-velocity bounds translate into one-step generation guarantees through the explicit drift size $ฮท$.


Multifidelity Gaussian process regression for solving nonlinear partial differential equations

arXiv.org Machine Learning

Solving nonlinear partial differential equations (PDEs) using kernel methods offers a compelling alternative to traditional numerical solvers. However, the performance of these methods strongly depends on the choice of kernel. In this work, as the available information is inherently multifidelity, we propose a kernel learning approach based on cokriging, leveraging empirical information from multifidelity simulations. In the first step, we fit a differentiable non-stationary kernel to an empirical kernel obtained from low-fidelity simulations. In the second step, we derive a high-fidelity kernel with estimated hyperparameters, and construct a corresponding high-fidelity mean using the multifidelity framework. These components can then be used within a Gaussian process framework for solving PDEs. Finally, we demonstrate the performance of the proposed physics-informed method on the Burgers' equation.



Boosting Spectral Clustering on Incomplete Data via Kernel Correction and Affinity Learning

Neural Information Processing Systems

Spectral clustering has gained popularity for clustering non-convex data due to its simplicity and effectiveness. It is essential to construct a similarity graph using a high-quality affinity measure that models the local neighborhood relations among the data samples. However, incomplete data can lead to inaccurate affinity measures, resulting in degraded clustering performance. To address these issues, we propose an imputation-free framework with two novel approaches to improve spectral clustering on incomplete data. Firstly, we introduce a new kernel correction method that enhances the quality of the kernel matrix estimated on incomplete data with a theoretical guarantee, benefiting classical spectral clustering on pre-defined kernels. Secondly, we develop a series of affinity learning methods that equip the selfexpressive framework with โ„“p-norm to construct an intrinsic affinity matrix with an adaptive extension. Our methods outperform existing data imputation and distance calibration techniques on benchmark datasets, offering a promising solution to spectral clustering on incomplete data in various real-world applications.


Appendix for based Test of Independence for Cluster correlated Data Contents

Neural Information Processing Systems

In this section, we present some preliminary results that will be useful in proving Theorem 3.2, Theorem 3.3 and Proposition 3.4. We draw upon existing theory on properties of random kernel matrices and extend these properties to cluster-correlated data. Specifically, we show the convergence of eigenvalues and eigenvectors of an empirical kernel matrix based on clustered data. Let (X,F,P) be a probability space and H be a Hilbert space over (X,F,P) with a symmetric kernel function k: X X R. Let H be a compact operator on H, defined by Hg(x) = Z Equivalently, Hn can be viewed as an n nreal matrix whose (i,j)-th entry is {Hn}i,j = 1 n k(Xi,Xj). This is the empirical kernel matrix scaled by a factor of 1/n. Here we restrict our discussion to a reproducing kernel Hilbert space (RKHS) H, where the kernel function k is positive semi-definite. We also assume that the operator H is Hilbert-Schmidt, with E[k2(X,X0)] < . Let ฮป(T) denote the spectrum of a compact, symmetric operator T. Then ฮป(H) and ฮป(Hn) are the sets of eigenvalues for H and Hn, respectively.


AKernel-based Test of Independence for Cluster-correlated Data

Neural Information Processing Systems

The Hilbert-Schmidt Independence Criterion (HSIC) is a powerful kernel-based statistic for assessing the generalized dependence between two multivariate variables. However, independence testing based on the HSIC is not directly possible for cluster-correlated data. Such a correlation pattern among the observations arises in many practical situations, e.g., family-based and longitudinal data, and requires proper accommodation. Therefore, we propose a novel HSIC-based independence test to evaluate the dependence between two multivariate variables based on clustercorrelated data. Using the previously proposed empirical HSIC as our test statistic, we derive its asymptotic distribution under the null hypothesis of independence between the two variables but in the presence of sample correlation. Based on both simulation studies and real data analysis, we show that, with clustered data, our approach effectively controls type I error and has a higher statistical power than competing methods.



Appendix: Remodel Self-Attention with Gaussian Kernel and Nystrรถm Method

Neural Information Processing Systems

Y-axis: Cross Entropy Loss on validation set. Figure 1 shows the validation loss changes with respect to training time for 50k steps as supplementary results for the experiments in Section 5. In general, Skyformer converges faster and finishes 50k steps earlier than vanilla Attention and Kernelized Attention over all tasks. We further remark that on Text Classification, all models quickly fall into over-fitting, and thus the validation losses rise quickly. On Pathfinder, due to the difficulty of training, in the trial shown in the figure vanilla Attention fails to reach the best long-time limit under a certain setting. Figure 2 shows the singular value distribution of attention output from the second layer of a trained vanilla transformer.



An Empirical Study on The Properties of Random Bases for Kernel Methods

Neural Information Processing Systems

Kernel machines as well as neural networks possess universal function approximation properties. Nevertheless in practice their ways of choosing the appropriate function class differ. Specifically neural networks learn a representation by adapting their basis functions to the data and the task at hand, while kernel methods typically use a basis that is not adapted during training. In this work, we contrast random features of approximated kernel machines with learned features of neural networks. Our analysis reveals how these random and adaptive basis functions affect the quality of learning. Furthermore, we present basis adaptation schemes that allow for a more compact representation, while retaining the generalization properties of kernel machines.